ABSTRACT


A thesaurus, consisting of 22,540 terms, was compiled for use
in indexing and retrieving scientific and technical intelligence
information. The thesaurus (covering activities monitored
by the Defense Intelligence Agency) can be printed by computer in
three forms: subject-structured display, permuted display,
and alphabetical display. The philosophy and procedures governing
thesaurus maintenance were also studied.

THESAURUS COMPILATION


System Development Corporation had contractual relations with
various members of the intelligence community which resulted in
developing thesauri for scientific and technical intelligence
activities. This work indicated a sufficient commonality of
interests between the community members to warrant development
of an integrated S&T thesaurus to satisfy the needs of all DOD
military S&T intelligence activities monitored by the Defense
Intelligence Agency (DIA).

The DOD S&T Intelligence Thesaurus is a compilation of microthesauri
produced for the Army Missile Intelligence Directorate
(MID), MICOM, Huntsville, Alabama; the Army Medical Intelligence
Office (MIO), Washington, D.C.; the Navy Scientific and
Technical Intelligence Center (STIC), Washington, D.C.; and
the Directorate for Scientific and Technical Intelligence, DIA
(DIAST-2C). It includes, as a base and model, the CIRC Thesaurus,
which had been developed at the Foreign Technology Division
(FTD), AFSC, Wright-Patterson AFB, Dayton, Ohio. During the
course of this contract FTD personnel consolidated the CIRC
thesaurus with the Vocabulary of Intelligence Concept Expressions
(VOICE), developed for the Army Foreign Science and Technology
Center (FSTC), Washington, D.C. These changes were
forwarded to the Falls Church Office of System Development
Corporation (SDC) as they became available and integrated into
the final thesaurus.

Upon receipt of the CIRC vocabulary on magnetic tapes from FTD,
the data was converted to a deck of cards, listed, and compared
with titles in subject files at each agency and appropriate
interest profiles. The results of this comparison were the
basis for a second subset or deck of terms for a particular
microthesaurus. These segments were then scrutinized in order
to determine the most appropriate location for additional
terminology. This intensive study revealed situations with
regard to the selection and organization of certain CIRC terms
that required considerable adjustment before terms for the
microthesauri could be added. Duplicate cards in the microthesaurus
decks and the 56 subject areas of CIRC were used to
assure consistency with published precedents. Considerable
effort was made to temporize on changes in the interest of
continuity.

The DOD S&T Intelligence Thesaurus is divided into three parts:
a subject-structured display, in which the terms are hierarchically
related within 56 broad subject areas; a permuted display,
in which the single and multi-word terms are arranged by
computer in the order of each word that appears within the
terms; and an alphabetical display, in which the terms are
listed alphabetically with cross references among subject-related
or hierarchically related terms and to synonyms where
applicable, and with scope notes to indicate the intended usage
of ambiguous terms. Publication is scheduled for the fall of

All microthesaurus terms are incorporated into the DOD S&T
Intelligence Thesaurus. However, there are slight differences
in structuring between the microthesauri and the thesaurus to
be published by FTD. These differences arise because of program
developments at FTD during the contract period. The microthesauri
listings are being produced at a local facility and
utilize a 1968 version of the Dayton computer program. Consequently,
they are structured in a format identical to the
last published CIRC thesaurus. Input material to the contract
thesaurus has been prepared for a 1969 version of the same
computer program, which provides somewhat increased capabilities
for term display.


A statistical comparison of the various entries in the December
1968 CIRC Thesaurus and the DOD S&T Intelligence Thesaurus is
shown below.


                                              December 1968   July 1969
                                              CIRC            DOD S&T
                                              Thesaurus       Intel Thesaurus

Official Terms (OT)                           12,999          15,427
Official Term Synonyms (SY)                    1,010           2,071
Official Nomenclature Terms (ONT)              4,425           4,509
Official Nomenclature Term Synonyms (ONT-SY)     302             533
                                              ______          ______
Total entries                                 18,736          22,540


These figures reveal a net increase of 3,804 entries in the
size of the vocabulary. Actually, the total change in the
vocabulary is far greater, since the consolidation of the microthesauri
required many term substitutions and revisions, as
well as extensive reorganization of the vocabulary in several
areas.
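
The totals and the net increase follow directly from the category counts in the comparison. The short Python sketch below does only that arithmetic; the dictionary and function names are illustrative and do not come from the report.

```python
# Entry counts by category for the two vocabularies being compared.
circ_dec_1968 = {"OT": 12_999, "SY": 1_010, "ONT": 4_425, "ONT-SY": 302}
dod_jul_1969 = {"OT": 15_427, "SY": 2_071, "ONT": 4_509, "ONT-SY": 533}


def total_entries(counts):
    """Sum the entries over all term categories."""
    return sum(counts.values())


def net_increase(old, new):
    """Per-category growth plus the overall growth under the key 'Total'."""
    growth = {category: new[category] - old[category] for category in old}
    growth["Total"] = total_entries(new) - total_entries(old)
    return growth


if __name__ == "__main__":
    print("CIRC total:", total_entries(circ_dec_1968))
    print("DOD S&T total:", total_entries(dod_jul_1969))
    for category, change in net_increase(circ_dec_1968, dod_jul_1969).items():
        print(f"{category:7s} {change:+d}")
```

The per-category differences show where the growth occurred; they say nothing about the substitutions and reorganization, which leave the counts unchanged.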

APPENDIX I


STUDY  AND  ANALYSIS  OF  MAINTENANCE  PROCEDURES  IN  CIRC  OPERATIONS 

SECTION  I 

REVIEW  OF  THE  CIRC  THESAURUS  MAINTENANCE  PROCEDURES 


INTRODUCTION 

Pursuant to a task requirement of Contract F10602-68-C-Q252 (PR-1-8-4466), a
study was made of the applicability of the present vocabulary support effort at FTD
to an expanded CIRC. Recommendations are made for systematic maintenance of
the DOD Scientific and Technical Intelligence Thesaurus.

An  overview  of  the  present  FTD  vocabulary  support  system  is  given.  A  suggested 
plan  is  presented  for  enlarging  the  scope  of  present  support  system  procedures 
to  accommodate  new  users  and  an  extended  vocabulary.  The  recommended  operating 
procedures  mirror  the  present  system  with  few  innovations.  It  is  hoped  that 
sufficient  detail  and  references  are  presented  to  provide  the  basis  for  a 
primer  or  handbook  for  new  activities  entering  the  CIRC  system. 

An investigation was made of a new approach to vocabulary support within the
CIRC system. The techniques are not new and can be seen during any demonstration
of COLEX or a time-shared management system. Basically, the approach is
to bring a direct-access automatic updating capability to the lexicographic
function. The project, called AUTOLEX (for automating lexicographic functions),
was initiated in order to derive some basic conclusions and recommendations for
further work. The project is described in Section II.

METHODOLOGY 

The review of the present vocabulary support system at FTD was accomplished
through direct conversations with the lexicographic group (TDBAC), members of
the Request Center (TDBIR), and the internal indexing group (TABAC-2). In
addition, the experience and knowledge of the SDC Dayton Office was utilized
by direct contact and review of pertinent documentation.

It was decided that the AUTOLEX project would best be demonstrated utilizing
a CIRC-like data base operated on by the direct access capability of the SDC
Time-Shared Data Management System (TDMS). TDMS was chosen over the On-line
Retrieval of Bibliographic Information Time-shared (ORBIT) System only because
the up-date capability of ORBIT was not complete at the time of the study.

CONSTRAINTS 

At no time was the goal of this effort considered as license to redesign the
entire vocabulary support system. To opt otherwise would generate massive
changes in a number of production programs because of the vocabulary dependencies.
Rather, the aim of this effort focused on finding the least disruptive
approach, in terms of resources expended, to integrating the microthesauri into
an expanded CIRC while providing the new activities with sufficient means to
maintain a current vocabulary.

Only those lexicographic functions which deal directly with the interaction of
users and the lexicographic group (TDBAC) were considered for investigation, a
decision which affected the detailing of problem areas encountered. Specific
information on equipment changes became known prior to publication, but after
this investigation had completed data collection phases. Since the study was
undertaken while the entire CIRC system was anticipating an equipment changeover
from an IBM-7094 to an IBM-360/65, it was assumed that conversion to
third-generation equipment would necessitate alterations to the present operational
format for vocabulary update and support.

PRESENT  SYSTEM  -  OVERVIEW 

The operating environment of the lexicographic group is depicted in Figure A1.
Communications with the user groups may occur through information specialists
at the Request Center or come about directly from the user to the lexicographic
group. The indexer-to-lexicography-office channel has always been one of a
direct exchange between the lexicographic group and a single source at each
indexing unit. The following sections describe those lexicographic functions
which relate directly to the transfer of information from the user to the
lexicography office.

VOCABULARY  CHANGE 

The CIRC vocabulary is meant to be highly "user oriented." Direct suggestions
from indexers and users are vital to insure a current and comprehensive
vocabulary. At present, user suggestions for vocabulary change take place
through the use of a Term Control Form FTD-O-87 (see Figures A2 and A3). One
copy of this four-part form is kept in suspense at the lexicography office;
the remaining copies are forwarded to the lexicography office through quality
control points. Final disposition of the request is communicated to the
initiator by means of copies of the same form. The form itself is adequate
for change requests and, provided the instructions are followed correctly, no
further information is required.

A CIRC term control file is maintained within the lexicography office. Figure
A4 shows the log-in procedures and duplication check made on all incoming
term control forms. The term control file consists of suspense and action
subfiles. The action file depicts all dispositions made on requested changes
to the vocabulary and is represented by the yellow, or fourth, copy of the
form. The suspense file represents those requests not yet processed and consists
of copies 1, 2, and 3 of the form. Both files are maintained in alphabetical
sequence by term.

Incoming requests are checked against the suspense file for like occurrences
originating from another activity and awaiting disposition. When the incoming
request is found to repeat a prior request, the new form is clipped
to the old one and returned to the suspense file. When final disposition is
made on the term in question, all activities requesting the term change are
notified.

When the check on the suspense file is negative, the action file is searched.
If the alphabetic search proves fruitless, the form copies are placed in proper
sequence in the suspense file. When it is found that the term has had prior
action, items 6-10 of the third copy are completed from the previous disposition
and returned to the latest originator.
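
The log-in and duplication check described above amounts to two lookups by term, first in the suspense file and then in the action file. The following Python sketch restates that procedure under stated assumptions: the class, method names, and return strings are hypothetical, dictionary lookups stand in for the alphabetical card files, and terms are compared without regard to case.

```python
from dataclasses import dataclass, field


@dataclass
class TermControlFile:
    """Suspense and action subfiles, both keyed by term (hypothetical model)."""
    suspense: dict = field(default_factory=dict)  # term -> originators awaiting disposition
    action: dict = field(default_factory=dict)    # term -> recorded disposition

    def log_in(self, term, originator):
        """Apply the duplication check to one incoming Term Control Form."""
        key = term.strip().upper()
        if key in self.suspense:
            # Repeat of a pending request: clip the new form to the old one.
            self.suspense[key].append(originator)
            return "attached to pending request"
        if key in self.action:
            # Prior action exists: answer from the previous disposition.
            return f"returned with prior disposition: {self.action[key]}"
        # Unknown term: file the form copies in the suspense file.
        self.suspense[key] = [originator]
        return "filed in suspense"

    def dispose(self, term, disposition):
        """Record the final disposition; return every originator to notify."""
        key = term.strip().upper()
        originators = self.suspense.pop(key, [])
        self.action[key] = disposition
        return originators


if __name__ == "__main__":
    files = TermControlFile()
    print(files.log_in("laser radar", "STIC"))
    print(files.log_in("Laser Radar", "MIO"))
    print(files.dispose("laser radar", "accepted"))
    print(files.log_in("laser radar", "FSTC"))
```

Keeping the list of originators with the pending term is what allows all requesting activities to be notified once the disposition is made.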

Fig. A1 CIRC Lexicographic Operating Environment


Fig. A2 Term Control Form

Changes initiated within the lexicographic section are recorded only on the
yellow copy of the form. These forms are placed in the suspense file until
they are processed.

The present policy is to maintain the suspense file for at least six months
before processing the requests into machine-readable form necessary for a
vocabulary update. The delay was caused by the low priority given to machine
processing for maintaining an up-to-date vocabulary. Production and statistics
dominate the operational environment within FTD. A candidate term of the
suspense file must have the following characteristics:

1.  Uniqueness  relative  to  the  other  terms  in  the  vocabulary. 

2. Application to an accepted interest area.

3.  Responsiveness  to  a  legitimate  demand. 

When a term is rejected by the lexicographic section, Item six of the Term
Control Form is completed. One copy is retained in the lexicographic action
file, and the remaining copies are returned to the originators.

Item six is also completed for candidate terms which are accepted; copies 1
and 3 are returned to the originating agency. The lexicography copy is
keypunched before being retired to the action file. Figure A5 represents the
general decision-making process for disposition of terms in the suspense file.

What has been described is the procedure now used within the lexicography section
for input, processing, and final disposition of requested term changes or
additions to the CIRC thesaurus. Next will be discussed the means presently
available to users and indexers for access to the most current and comprehensive
version of the CIRC thesaurus.

VOCABULARY  CHANGE  ANNOUNCEMENTS 

A most important aspect of lexicographic responsibility is the timely announcement
to the user group of all additions and modifications to the system
vocabulary. Maximum return can be expected if the changes are displayed already
assimilated into the groups and hierarchical structure of the Thesaurus. At
present, a revised edition of the CIRC Thesaurus is published annually. This
is a three-volume publication that displays the vocabulary in three formats: a
structured vocabulary (VOCSS), a permuted index, and an alphabetized vocabulary
(THESA). The high cost of producing the thesaurus prevents more frequent publication
and distribution.
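
A permuted index files each term under every word it contains, so that a multi-word term can be found from any of its words. The Python sketch below illustrates that general idea only; it is not the FTD program, the function name is hypothetical, and refinements such as skipping common words are deliberately left out.

```python
def permuted_index(terms):
    """Return (filing word, full term) pairs, one pair per word of each term."""
    entries = []
    for term in terms:
        for word in term.split():
            entries.append((word.upper(), term))
    # Order by filing word first, then by the full term.
    return sorted(entries)


if __name__ == "__main__":
    sample = ["SCIENTIFIC PERSONNEL", "TECHNICAL ASSISTANCE", "LIBRARY"]
    for word, term in permuted_index(sample):
        print(f"{word:12s} {term}")
```

With the three sample terms, SCIENTIFIC PERSONNEL is listed under both PERSONNEL and SCIENTIFIC, which is why users find this display convenient for locating a term when only part of it is remembered.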

At six-month intervals, an updated permuted index is produced and distributed.
This publication is regarded by users as the most useful of the computer-produced
tools for indexing and retrieval. In the past, typewritten lists of
all changes and modifications to the vocabulary, with the changes keyed to the
proper pages of the permuted index, were furnished periodically. Reaction of
the user groups to this method of announcement was extremely negative because
of the need to post changes manually on the latest printed formats in order
to maintain a current and comprehensive display of the vocabulary. In addition,
term additions to profiles are reported back to the requestor.

Other tools produced by the computer support system are geared to the lexicographic
group only. Listings showing over-posted terms, frequency counts,
rejected terms, unofficial nomenclature, profile information, etc., are printed
on demand or as a result of a programmed update of the vocabulary. These tools
are most important to the lexicographer, who, in order to make sound judgments
on new candidate terms, must have at his disposal all possible information
concerning the existing thesaurus.

PROBLEM  AREAS 


Some problems stemming from the consolidation of the microthesauri into an
expanded CIRC vocabulary and the impact of additional users on the present
system of vocabulary control and update should be anticipated. It is difficult
to determine, at this point, what influence the new equipment configuration
will have on existing problem areas. The effect could be a complete revision
of system operation or, at the very least, a shift in policy from undue concern
with production figures to a more realistic proportion of priority given to
system maintenance.

The execution of this contract will reduce the vocabulary maintenance backlog.
For instance, a major function of the lexicographic section is thesaurus
review. In view of the insufficient staff and the lack of computer support at
FTD, this would be a massive undertaking. Preliminary estimates of new unique
terms to be added to the thesaurus were as high as 10,000 terms.¹ This would
be a monumental task if it were assigned to the lexicographic section on top of
normal work loads. Under the provisions of the contract, however, a group of
experienced lexicographers has been assembled to work with FTD in insuring the
validity and proper cross referencing of each term in each microthesaurus.

This effort should uncover and correct many of the discrepancies which presently
exist in the thesaurus.

The areas of concern to this study are those most apt to endanger a simple
transition to an expanded CIRC and the introduction of new S&T intelligence
activities to the CIRC system. The following points are considered critical to
FTD in its responsibility as the Executive Agency of DIA.

1.  The  lack  of  proper  guidelines  and  clearly  defined  standard  operating 
procedures  for  allowing  new  activities  to  enter  the  CIRC  system. 

2.  Insufficient  manning  levels  within  the  lexicography  office  to  maintain 
the  vocabulary  effectively. 

3. Insufficient support from the computer facility, which precludes
thesaurus updates on a monthly basis and the production of vocabulary
change announcements. This lack of operational support to the quality
control function is a prime factor in preventing the Thesaurus from
becoming truly current and comprehensive.


¹ The actual growth by about 3,800 terms was below estimates because many candidate
terms failed to meet the criteria of being conceptually unique, and their
introduction would have impaired rather than aided utility.


31 July 1969                                          TM-WD-(L)-321/000/00

RECOMMENDATIONS 

Of primary concern is the need at every new activity for CIRC system documentation,
operating procedures for interaction with the system, and technical
advice as to the creation of a workable in-house CIRC user unit. In order to
meet these needs, it is recommended that:

1. Each activity agency create an information specialist desk which will
be the quality control point for all requests from analysts to the
CIRC system.

2. The agency information specialist will also be responsible for quality
control of all in-house indexing in order to improve its efficacy and
consistency with that of other activities.

3. The information specialist act as the single-point liaison for all
communication between the FTD lexicographic section and the participating
unit.

4. The information specialist shall maintain an action and suspense file
on all Term Control Forms originating from his organization.

5. The FTD lexicographic section devise a training program for the information
specialist of each new participating agency; such program
to include a thorough briefing on system operation and system responsibilities.

6. The FTD lexicographic section prepare a package of guidelines and
system documentation designed to orient new activities to their responsibilities
in preparing themselves to interact with the CIRC
system.

7. The FTD lexicographic section prepare the necessary justification for
computer support which will allow for monthly updates of the CIRC
thesaurus.

8. The FTD lexicographic section prepare for distribution to the users
the necessary support to allow a monthly supplement of thesaurus
changes to be produced in the form of a cumulative permuted index of
changes, month by month, until the semi-annual publication of the
complete permuted index is released.

9. An investigation be undertaken of the application of on-line methods,
particularly operating data management systems, to lexicographic
functions. (See Section II for further information.)

10. In order to properly carry out the responsibilities of liaison,
instruction and vocabulary control, the FTD lexicographic section be
increased permanently to:

1 Supervisor (1939)
1 Secretary (GS/5)
2 Lexicographers, Sr. (1156)
2 Lexicographers (1373)
2 Clerks (GS4/5)

APPENDIX I

STUDY  AND  ANALYSIS  OF  MAINTENANCE  PROCEDURES  IN  CIRC  OPERATIONS 

SECTION  II 
AUTOLEX 


BACKGROUND 

Since the decision was made that the Centralized Information Reference and
Control (CIRC) system at the Foreign Technology Division of AFSC was to be the
basis for a DOD-wide S&T intelligence information processing system, operational
capabilities have been improved to enhance the services to the whole
intelligence community. A case in point is the DIA-sponsored experiment
allowing analysts throughout the intelligence community direct access to a
CIRC-like data base through remote teletype consoles. The CIRC On-Line Experiment
(COLEX) proved successful and resulted in DIA approval to go operational
with the CIRC On-Line (CIRCOL) retrieval system.

The COLEX experience also underlined a basic requirement for a current and
comprehensive thesaurus reflecting all terms and their relationships required
by each organization and so structured as to satisfy the needs of indexing,
retrieval, and dissemination throughout the intelligence community. To meet
this requirement, DIA directed an effort to:

1. Create a computerized scientific and technical intelligence thesaurus
which would represent the combined interests of all S&T intelligence
organizations.

2. Provide a means of systematic updating and control of terminology in
this thesaurus.

Pursuant to the second goal, there are indications that other means of operational
support of the thesaurus should be investigated. This paper describes
an exploration of the merits of automating lexicographic (AUTOLEX) functions
within the working environment of a Time-Shared Data Management System (TDMS).

SCOPE  OF  THE  PROJECT 

The  AUTOLEX  project  was  designed  to  provide  operational  data  on  a  direct-access 
capability  for  lexicographic  functions.  The  project  was  concerned  specifically 
with  applying  operational  direct-access  techniques  to  a  large  machine-file 
thesaurus  for  the  purpose  of  update  and  modification. 

In the interest of economy, the already operational TDMS was chosen as the test
vehicle and a test data base was created which resembled as closely as possible
the CIRC thesaurus master file in subject-area order. The test data base contained
only terminology in basic sciences so that no compromise of sponsor or
parochial interests could be implied.

TDMS ENVIRONMENT

The  AUTOLEX  project  was  envisioned  as  requiring  a  dialogue  capability  (through 
CRT  or  console)  that  would  permit  a  lexicographer  to  query,  review,  modify, 
delete,  and  update  the  files.  In  addition,  listings  in  specified  formats  could 
be  obtained.  However,  for  the  purposes  of  the  test,  a  minimum  requirement  of 
scanning,  selecting,  and  updating  Official  Terms  was  decided  upon.  The  TDMS 
system,  though  incomplete  at  the  time,  had  the  computer  programs  necessary  to 
meet  the  minimum  test  requirements. 

DATA  BASE  DESCRIPTION 

The first ten subject areas (representing about J300 Official Terms) of the
CIRC thesaurus were chosen for the test data base. A fairly simple group of
data elements was selected and recorded in a format required by TDMS. More
sophisticated test data would not have contributed measurably to the realism
of the simulation.

The following elements were chosen:


     Element Name         Description

 1.  TERM DESIGNATOR      OT, ONT, SA, SN
 2.  SUBJECT CODE         1 through 56
 3.  TERM LEVEL           1 through 5
 4.  COSATI CODE          Four digit code from TEST or other
                          COSATI based sources
 5.  OFFICIAL TERM
 6.  TERM CODE            Numerical position in FTD file
 7.  COSATI LEVEL         1 through 5
 8.  COSATI TERM CODE     Numerical position in COSATI file
 9.  FREQUENCY COUNT      NUMBER OF POSTINGS
10.  PROFILE DATA
11.  USER CODE            ACTUAL UNIT DESIGNATORS
12.  AGENCY DATA
13.  AGENCY CODE          TERM IS UNIQUE TO....
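
To make the element list concrete, the record can be pictured as one structure per term, with the two repeating groups (PROFILE DATA and AGENCY DATA) held as lists. The Python sketch below is only an illustration of that layout: the class and field names are hypothetical, the range checks repeat the limits given in the list, and nothing here reflects how TDMS stored data internally.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Designator values as listed for element 1.
DESIGNATORS = {"OT", "ONT", "SA", "SN"}


@dataclass
class ThesaurusEntry:
    term_designator: str                      # element 1
    subject_code: int                         # element 2: 1 through 56
    term_level: int                           # element 3: 1 through 5
    official_term: str                        # element 5
    cosati_code: Optional[int] = None         # element 4: four digit code
    term_code: Optional[float] = None         # element 6: position in FTD file
    cosati_level: Optional[int] = None        # element 7: 1 through 5
    cosati_term_code: Optional[int] = None    # element 8: position in COSATI file
    frequency_count: int = 0                  # element 9: number of postings
    user_codes: List[str] = field(default_factory=list)    # elements 10-11
    agency_codes: List[str] = field(default_factory=list)  # elements 12-13

    def __post_init__(self):
        if self.term_designator not in DESIGNATORS:
            raise ValueError("unknown term designator")
        if not 1 <= self.subject_code <= 56:
            raise ValueError("subject code must be 1 through 56")
        if not 1 <= self.term_level <= 5:
            raise ValueError("term level must be 1 through 5")
        if self.cosati_level is not None and not 1 <= self.cosati_level <= 5:
            raise ValueError("COSATI level must be 1 through 5")
        if self.frequency_count < 0:
            raise ValueError("frequency count cannot be negative")


if __name__ == "__main__":
    entry = ThesaurusEntry("OT", 4, 1, "LIBRARY", cosati_code=502,
                           term_code=222.0, agency_codes=["FTD", "STIC"])
    print(entry)
```

Holding the agency codes as a list mirrors the repeating-group idea: a term may be associated with any number of agencies without changing the shape of the record.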

TDMS COMPONENTS

The following operations were utilized during the test phase:

1. The Define operation, which is necessary to describe the data elements
to the system (Fig. A6).

2. The Generate (Load) operation, which is used to create a TDMS data base or to add
entries to an existing data base (Fig. A7).

3. The Query operation, which permits the retrieval and printing of
selected data from a specified data base. A limited capability for
calculations is provided (Fig. A8).

4. The Update operation, which allows changes to existing data values and
the addition or deletion of data values or entire logical entries to
and from a data base (Fig. A9).

5. The Compose operation, which is designed to permit a user to describe
the format of a report for subsequent generation (Fig. A10).

PROJECT  DIARY 

A  duplicate  deck  was  produced  from  the  existing  thesaurus  master  file.  The 
elements  represented  were  subject  code,  level  code,  and  official  term.  All 
additional  elements  required  were  arbitrarily  created  and  recorded  on  work¬ 
sheets  for  keypunching  on  the  existing  cards. 

On the advice of the TDMS maintenance staff at Falls Church, it was decided to
run the Define and Load operations at Santa Monica to take advantage of the
experience of the staff, to eliminate line charges for Falls Church access,
and to save time in the creation of the data base.

The  Define  operation  gave  no  problems;  however,  the  Load  operation  caused  some 
difficulties.  Two  days  were  spent  in  correcting  data  base  errors  brought  about 
by  misinterpretation  of  TDMS  user  guides.  Then,  with  the  data  base  in  fairly 
good  order,  the  next  four  days  were  spent  in  trying  to  complete  the  Load 
operation  under  the  time  sharing  system.  As  it  turned  out,  the  mean-time-to- 
failure  of  the  time  sharing  system  (with  10  to  20  users)  was  less  than  the  time 
required  to  complete  the  Load  operation.  This  necessitated  the  acquiring  of 
the  computer  on  a  special  request  to  operate  with  no  other  users.  Under  these 
conditions  the  load  time  took  twenty  minutes. 
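
The Load difficulty is a race between the length of the job and the interval between system failures. As a rough illustration, and assuming (this is not stated in the report) that failures occur at a constant average rate so that times to failure are exponentially distributed, the chance that a job of a given length runs without interruption is exp(-job length / mean time to failure). The function names and the figures passed to them below are hypothetical.

```python
import math


def completion_probability(job_minutes, mttf_minutes):
    """Chance of finishing without a failure, assuming exponential times to failure."""
    if job_minutes < 0 or mttf_minutes <= 0:
        raise ValueError("job length must be >= 0 and MTTF must be > 0")
    return math.exp(-job_minutes / mttf_minutes)


def expected_attempts(job_minutes, mttf_minutes):
    """Average number of tries if every failure forces a restart from the beginning."""
    return 1.0 / completion_probability(job_minutes, mttf_minutes)


if __name__ == "__main__":
    # Illustrative values only: a job as long as the MTTF, and one three times as long.
    for job, mttf in [(20, 20), (60, 20)]:
        p = completion_probability(job, mttf)
        print(f"job={job} min, MTTF={mttf} min: "
              f"P(complete)={p:.3f}, expected attempts={expected_attempts(job, mttf):.1f}")
```

Under this simple model, a job that lasts as long as the mean time to failure succeeds on only about 37 percent of attempts, and the odds fall quickly as the job grows longer, which is consistent with the decision to run the Load with no other users on the machine.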

The remaining TDMS components were tested at Falls Church via the remote
teletype hook up with the Santa Monica computer facility. The Query and Update
components were run exhaustively, but little time was realized on the Compose
component because of its state of completion at the time.

CONCLUSIONS 

For a number of reasons, the AUTOLEX project was not given a true test. In a
technical sense, the environment lacked the crucial interaction with the production
side of the CIRC operation. The only proof of success in updating the



/login msdhourly 30
LOGGED IN 10:12 10/30/68
/define
PGM. STARTED

DEFINE VERSION 5.2 IS OPERATING
ENTER 'DEFINE OR REDEFINE': define
DATA BASE NAME IS: alex
ENTRY TERMINATOR IS: end
ENTER 'TTY' OR INPUT FILE IDENTIFIER: tty
BEGIN INPUTS
1 term designator (name)
NEXT
2 subject code (number) values are 01...56
NEXT
3 term level (number) values are 1...5
NEXT
4 cosati code (number) format is 0999 values are 0100...2204
FORMAT ACCEPTED.
NEXT
5 official term (name)
NEXT
6 term code (number)
NEXT
7 cosati level (number) values are 1...5
NEXT
8 cosati term (number)
NEXT
9 frequency count (number) values are 0...3999
NEXT
10 profile data (repeating group)
NEXT
11 user code (name in 10)
NEXT
12 agency data (repeating group)
NEXT
13 agency code (name in 12)
OUTPUT FILE IDENTIFIER IS: dalex
FILE DALEX, 2314 VOL. 9001 OPENED ON A0
DEFINE OPERATION SUCCESSFULLY COMPLETED
DEFINE VERSION 5.2 IS OPERATING
ENTER 'DEFINE OR REDEFINE': /quit define
PGM. QUIT


(Note - lower case text was input at the console; upper case text was
system response)

Fig. A6 AUTOLEX Define Example
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/generate 
PGM. STARTED 

GENERATE  VERSION  5.2  IS  OPERATING 
ENTER  OUTPUT  FILE:  autolex  v40346 
NEW/ADDON/RESTART:  new 
2314  VOL  0346  10  BE  MUTED  ON  A6 
FILE  AUTOLEX,  2314  VCL.  0346  OPENED  ON  A6 
ENTER  DESCRIPTION  FILE:  dalex  v40346 
t  ENTER’ TTY’  OR  INPUT  DATA  FILES:  autape  v95983  S2 

THE  DATA  BASE}  NAME  IS  'ALEX'  . 

TAPE  VOL  5983  TO  BE  MNTED  ON  CA 
NEXT 

trace  100 

NEXT 

run 

FILE  LCONELEX,  2314  VOL.  9003  OPENED  ON  A2 

FILE LFAILLEX, 2314 VOL. 9001 OPENED ON A0

FILE CFINDLEX, 2314 VOL. 9002 OPENED ON A1

FILE  LNAMELEX,  2314  VOL.  9003  OPENED  ON  A2 

'100'  ENTRIES  PROCESSED 

'200'  ENTRIES  PROCESSED 

'300'  ENTRIES  PROCESSED 

'400'  ENTRIES  PROCESSED 

'500'  ENTRIES  PROCESSED 

'600'  ENTRIES  PROCESSED 

'700' ENTRIES PROCESSED

'800'  ENTRIES  PROCESSED 

GENERATE  OPERATIONS  SUCCESSFULLY  COMPLETED 


Fig. A7 AUTOLEX Generate Example

/query
PGM. STARTED

QUERY VERSION 5.2 IS OPERATING
ENTER DATA BASE FILE IDENTIFIER: autolex v40346
2314 VOL 0346 TO BE MNTED ON B4

NEXT

describe

C1   TERM DESIGNATOR (NAME)
C2   SUBJECT CODE (NUMBER) VALUES ARE 01...56
C3   TERM LEVEL (NUMBER) VALUES ARE 1...5
C4   COSATI CODE (NUMBER) FORMAT IS 0999 VALUES ARE 0100...2204
C5   OFFICIAL TERM (NAME)
C6   TERM CODE (NUMBER)
C7   COSATI LEVEL (NUMBER) VALUES ARE 1...5
C8   COSATI TERM (NUMBER)
C9   FREQUENCY COUNT (NUMBER)
C10  PROFILE DATA (REPEATING GROUP)
C11  USER CODE (NAME IN 10)
C12  AGENCY DATA (REPEATING GROUP)
C13  AGENCY CODE (NAME IN 12)


/show  c6 


VI 

Vlll 


1 

222.0 


SEARCH:  $ 

print entry where c6 gq 150

C1-OT C2-4 C3-1 C4-502 C5-LIBRARY C6-222.0 (C12)C13-FTD (C12)C13-STIC

print entry where c9 gq 700

C1-OT C2-4 C3-1 C4-509 C5-SCIENTIFIC PERSONNEL C6-86 C9-765 (C10)C11-AMD
(C12)C13-FTD (C12)C13-STIC

C1-OT C2-4 C3-2 C4-501 C5-TECHNICAL ASSISTANCE C6-93 C9-714 (C10)C11-DPCE/H
(C12)C13-FTD (C12)C13-STIC

/quit query

PGM. QUIT 


Fig. A8 AUTOLEX Query Example
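
The "print entry where ..." requests in the query session amount to filtering entries on one component. The sketch below imitates that behavior in Python under stated assumptions: "gq" is taken to mean "greater than or equal to" (the session output is consistent with this reading but does not confirm it), and `select` and `parse_where` are hypothetical helper names. The sample entries use only values that appear in the session.

```python
import operator

# Assumption: "gq" means greater-than-or-equal; "eq" means equal.
OPERATORS = {"eq": operator.eq, "gq": operator.ge}

ENTRIES = [
    {"c1": "OT", "c2": 4, "c3": 1, "c4": 509, "c5": "SCIENTIFIC PERSONNEL",
     "c9": 765, "c11": ["AMD"], "c13": ["FTD", "STIC"]},
    {"c1": "OT", "c2": 4, "c3": 2, "c4": 501, "c5": "TECHNICAL ASSISTANCE",
     "c9": 714, "c11": ["DPCE/H"], "c13": ["FTD", "STIC"]},
    {"c1": "OT", "c2": 4, "c3": 1, "c4": 502, "c5": "LIBRARY",
     "c6": 222.0, "c13": ["FTD", "STIC"]},
]


def parse_where(clause):
    """Split a clause such as 'c9 gq 700' into (component, operator, value)."""
    component, op, raw = clause.split()
    return component, op, float(raw)


def select(entries, component, op, value):
    """Return the entries that carry the component and satisfy the comparison."""
    compare = OPERATORS[op]
    return [entry for entry in entries
            if component in entry and compare(entry[component], value)]


hits = select(ENTRIES, *parse_where("c9 gq 700"))
print([entry["c5"] for entry in hits])
```

Entries that lack the named component are simply skipped, which is one plausible reading of how the LIBRARY entry drops out of the second request.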
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/update 
PGM. STARTED 

UPDATE  VERSION  5.2  IS  OPERATING 

ENTER DATA BASE FILE IDENTIFIER: autolex v40346

DATA  BASE  NAME  IS  'ALEX',  MOD  14,  DATE  12/10/68 

USE  IT,  'Y/N':  y 

NEXT 

print entry where c9 eq 765

C1-OT C2-4 C3-1 C4-509 C5-SCIENTIFIC PERSONNEL
C6-86 C9-765 (C10)C11-AMD (C12)C13-FTD (C12)C13-STIC

add profile data, user code = jjb where c9 eq 765

'1' ENTRIES, '1' OCCURRENCES SELECTED, 'EX/MORE/S':
ex

NEXT

print entry where c9 eq 765

C1-OT C2-4 C3-1 C4-509 C5-SCIENTIFIC PERSONNEL C6-86 C9-765 (C10)C11-JJB
(C10)C11-AMD (C12)C13-FTD (C12)C13-STIC

add profile data, user code = jjb where c9 eq 714

'1' ENTRIES, '1' OCCURRENCES SELECTED, 'EX/MORE/S':
ex

NEXT

print entry where c9 eq 714

C1-OT C2-4 C3-2 C4-501 C5-TECHNICAL ASSISTANCE C6-92 C9-714 (C10)C11-JJB
(C10)C11-DPCE/H (C12)C13-FTD (C12)C13-STIC


/quit update
PGM. QUIT 
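
The "add profile data" requests in the update session attach a new occurrence of a repeating group to every entry that meets a condition. A minimal Python sketch of that idea follows. The function name `add_occurrence` and the flat list used to hold user codes are assumptions made for illustration, and the order in which TDMS stores occurrences is not modeled.

```python
import copy

ENTRIES = [
    {"c5": "SCIENTIFIC PERSONNEL", "c9": 765, "c11": ["AMD"]},
    {"c5": "TECHNICAL ASSISTANCE", "c9": 714, "c11": ["DPCE/H"]},
]


def add_occurrence(entries, group, value, component, target):
    """Add value to the repeating group of each entry whose component equals target.

    Returns the number of entries changed, as the console reports it.
    """
    changed = 0
    for entry in entries:
        if entry.get(component) == target:
            entry.setdefault(group, []).append(value)
            changed += 1
    return changed


updated = copy.deepcopy(ENTRIES)          # leave the original entries untouched
count = add_occurrence(updated, "c11", "JJB", "c9", 765)
print(count, updated[0]["c11"])
```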
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/compose 


PGM. STARTED 

COMPOSE  VERSION  5.2  IS  OPERATING 

ENTER DATA BASE FILE IDENTIFIER: autolex v40346

ENTER  COMPOSE  FILE  IDENTIFIER  OR  'NONE': 


none 

NEXT 

qualify  entry  where  c9  gq  500 
NEXT 

title is unclassified
'T1' ACCEPTED


heading is subj area, level, term, freq cnt

'H1' ACCEPTED

NEXT

content is subject code, term level, official term, c9


'C1' ACCEPTED
NEXT

put top t1, h1

NEXT 

run 

REPORT GENERATION HAS BEGUN
DATA BASE NAME IS... ALEX
DATE OF REPORT GENERATION IS... 12/11/68

UNCLASSIFIED

SUBJ AREA  LEVEL  TERM                          FREQ CNT
4          1      SCIENTIFIC PERSONNEL          765
4          1      SCIENTIFIC RELATION           615
4          2      TECHNICAL ASSISTANCE          714
4          3      FOREIGN TECHNICAL ASSISTANCE  502
4          1      SCIENTIFIC CONFERENCE         566


/quit  compose 
PGM. QUIT 


Fig. A10 AUTOLEX Compose Example

thesaurus would be the successful operation of the files by retrieval and
dissemination functions in real time with usual volumes. Also, the limitations
of TDMS (it was not created as a lexicographic tool) prevented a realistic
demonstration of providing direct access to a machine-file thesaurus. For
instance, each TDMS operation must be "called in" and "quit" before any other can
be brought into play, resulting in an undesirable discontinuity.

The  number  of  elements  chosen  for  the  entry  description  was  reduced  to  save 
time  in  creating  the  data  base,  and  was  not  analogous  to  that  encountered  in 
an  operational  situation.  Despite  these  difficulties,  the  test  results  justify 
continuance  of  the  effort.  The  results  show  that  on-line  processing  will 
improve  existing  thesaurus  maintenance  capabilities  in  the  following  ways: 

1. Terms can be added to the thesaurus at any time. Updating more
frequently than on the present six-month cycle would mean that the
vocabulary would be "correct" at any given time. Crash programs to
meet deadlines for updating runs would never be necessary.

2.  The  time  lag  would  be  reduced  between  the  submission  of  suggested  terms 
by  users  and  the  availability  of  those  terms  to  indexers  and  searchers. 

3. The complex filing system could be simplified, freeing personnel in
the  lexicography  office  for  more  productive  tasks. 

RECOMMENDATIONS 

The  following  points  should  be  considered  in  developing  test  criteria  for  a 
useful  and  practical  vocabulary  control  and  maintenance  system. 

1. The test should be operated under a system of direct access much like
the ORBIT/360. This would require special programs tailored to
lexicographic functions.

2. The test should be operated by lexicographic personnel in order to
arrive at the best possible combination of man-machine specifications.

3. Consideration should be given to all possible types of interacting
devices, such as display units, consoles, and teletypes. As a
special part of the test, displays could be given to indexers and
analysts for the purposes of screening and selection (but not updating)
of the most current thesaurus.

4. Special consideration should be given to the "Compose" or Report
Generator function of the AUTOLEX project. This could very well be the
most important feature of such a maintenance system. The ability to
change a report format without outside programming support would
provide a degree of flexibility to vocabulary maintenance operations
that would extend present capabilities significantly. For example,
studies could be conducted of the indexing and retrieval effectiveness
of terms; term utilization in profiles; term inter-relationships based
on profiles, indexing, and searching; and experimental term display
formats.
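
The value of a report generator whose format is supplied as data, rather than programmed for each report, can be illustrated with a short Python sketch. It is not a reconstruction of the TDMS Compose function: the `compose` function and its parameters are hypothetical, and the sample rows are taken from the report shown in Fig. A10.

```python
def compose(entries, qualify, title, heading, content):
    """Build a plain-text report: title, heading line, one line per qualifying entry."""
    rows = [[str(entry[field]) for field in content]
            for entry in entries if qualify(entry)]
    # Each column is as wide as its heading or its longest value.
    widths = [max([len(head)] + [len(row[i]) for row in rows])
              for i, head in enumerate(heading)]

    def fmt(cells):
        return "  ".join(cell.ljust(width)
                         for cell, width in zip(cells, widths)).rstrip()

    return "\n".join([title, fmt(heading)] + [fmt(row) for row in rows])


ENTRIES = [
    {"subject code": 4, "term level": 1,
     "official term": "SCIENTIFIC PERSONNEL", "frequency count": 765},
    {"subject code": 4, "term level": 1,
     "official term": "SCIENTIFIC RELATION", "frequency count": 615},
    {"subject code": 4, "term level": 2,
     "official term": "TECHNICAL ASSISTANCE", "frequency count": 714},
    {"subject code": 4, "term level": 3,
     "official term": "FOREIGN TECHNICAL ASSISTANCE", "frequency count": 502},
    {"subject code": 4, "term level": 1,
     "official term": "SCIENTIFIC CONFERENCE", "frequency count": 566},
]

report = compose(
    ENTRIES,
    qualify=lambda entry: entry["frequency count"] >= 600,
    title="UNCLASSIFIED",
    heading=["SUBJ AREA", "LEVEL", "TERM", "FREQ CNT"],
    content=["subject code", "term level", "official term", "frequency count"],
)
print(report)
```

A different report, for instance terms and frequency counts only, would need new `heading` and `content` lists but no change to the function, which is the kind of flexibility the recommendation has in mind.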
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APPENDIX  II 

COMMENTS  ON  THE  CIRC  THESAURUS 


The expansion of the vocabulary in the subject fields of interest to MID, MIO,
STIC, and DIAST-2C brought about specific recommendations for changes in the
organization of the CIRC Thesaurus dated 31 December 1968. In addition,
recommendations were made for the deletion of terms that were believed to be
redundant, overly specific, insufficiently defined, or otherwise unsuitable
for inclusion in the DOD S&T Intelligence Thesaurus. Apart from these
recommendations, a conscious effort was made to adhere to the principles of
term selection, term construction, and the display of term relationships that
had been set forth in the CIRC Lexicographic Guide (FB-DA(L)-150/020/02) and
that were evidenced in the CIRC Thesaurus. In this sense, the compilation of
the S&T Thesaurus was not regarded as a wholesale revision of the CIRC Thesaurus.

As  the  work  progressed,  it  became  clear  that  the  CIRC  Thesaurus  philosophy 
exerts  certain  constraints  upon  vocabulary  development,  particularly  upon 
vocabulary  display.  The  purpose  of  this  discussion  is  to  set  forth  in  some 
detail  the  nature  of  these  constraints  and  to  recommend  ways  in  which  the 
thesaurus  format  can  be  altered  to  provide  an  indexing  and  retrieval  tool  of 
greater flexibility and utility. This should not be considered a critique
of the existing system or of the personnel involved. Rather, it is an
exploration of ways to increase the capabilities of CIRC, to amplify the
explanation of the CIRC philosophy in the Lexicographic Guide and in the
thesaurus introduction, and to set forth some considerations for other related
vocabulary efforts. The last of these could be particularly significant in any
ensuing thesaurus compilation projects.

TERM  SELECTION 

The key to compiling a thesaurus is the determination of what concepts are to
be represented. It is essential that considerations of format or term display
not be allowed to obscure the importance of content. Once a concept has been
identified, there follows the determination of:

1. The term that best connotes that concept.

2. The exact construction of the term.

3. The relationship(s) of the term to other terms in the vocabulary.

The  most  recent  CIRC  Thesaurus  contained  many  terms  for  which  meanings  could 
not  be  ascertained,  several  cases  of  apparent  synonymy  between  Official  Terms, 
and  a  great  many  specific  terms  connoting  concepts  that  could  be  represented 
by  a  combination  of  more  general  terms.  Moreover,  many  inconsistencies  with 
respect  to  term  construction  were  noted.  It  was  practical  to  rectify  only 
the  most  serious  of  these  discrepancies  while  fulfilling  the  provisions  of 
the  contract,  but  it  seems  likely  that  the  difficulties  brought  about  by 
these situations will be compounded as the vocabulary grows in size and scope.

The exact causes for these discrepancies are not fully understood, but it
seems likely that several factors contributed. Among them are:

1. The inclusion in the original CIRC vocabulary of poorly defined or
unnecessary terms from the Intelligence Subject Code (ISC).

2. Workloads and publication deadlines at FTD that prohibited adequate
research and consideration of candidate terms.

3. An organizational structure of FTD which treats lexicography as a
non-technical function subordinate to indexing and analysis.

4. An aversion on the part of FTD lexicographers to terms having high
posting densities or rapid increases in posting density.

5. The tendency of the update programs to militate against drastic
changes in the thesaurus structure and content.

At the time the original vocabulary was organized, there was a requirement to
adhere to the structure of the ISC. This gave rise to some terms that were
coined to cover an area or range of ISC numbers but were not expressions of
definable concepts. In other cases, a heterogeneous collection of materials
or items was covered by a generic term which was not given a Scope Note.
Other broad terms were incorporated for structuring utility, but had no caveat
to encourage the use of more specific Official Terms. As a result, many
overly general, poorly defined, or vague terms became ingrained in the
vocabulary and have set poor precedents for term selection and construction.
Examples are: SYNTHETIC MATERIAL, NONSTRUCTURAL MINERAL PRODUCT, INTERMOLECULAR
FORCE, MATTER STRUCTURE, and TRANSPORTATION STATUS.

The lexicographers at FTD are permitted to exercise relatively little
discretion with respect to the evaluation of terms that are submitted by indexers,
analysts, and CIRC users. This, coupled with heavy workloads and tight
publication schedules, is believed to have caused many terms of dubious
indexing utility to be added to the thesaurus over a period of years. In addition,
the FTD policy is to retain any term that appears in any CIRC profile whether
or not it has ever actually been used in indexing. This obviously makes
editing the thesaurus difficult.

For some time the practice at FTD has been to prohibit the use of terms which
have a frequency of posting that exceeds a certain predetermined level. The
usual approach has been to lower the frequency of posting by the synthesis of
many very specific terms that connote various aspects of the general concept.
This has produced a proliferation of specific terms in a few areas, notably
materials, chemistry, and metallurgy, that may unnecessarily contribute to
difficulties of vocabulary maintenance. For example, of some 70 terms
relating to copper, many are extremely specific and connote concepts that could
be represented by combinations of valid terms that are more general.
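
Representing a specific concept by a combination of more general terms corresponds, at search time, to intersecting the postings of those terms. The Python sketch below shows the idea with invented data: the document numbers are hypothetical, and CORROSION and WELDING are placeholder terms chosen for illustration, not entries known to be in the CIRC Thesaurus.

```python
# Hypothetical postings: each term maps to the set of documents indexed by it.
POSTINGS = {
    "COPPER": {1, 2, 3, 5, 8},
    "CORROSION": {2, 3, 7},
    "WELDING": {3, 5, 9},
}


def search_all(postings, *terms):
    """Return the documents indexed under every one of the given terms."""
    sets = [postings.get(term, set()) for term in terms]
    return set.intersection(*sets) if sets else set()


# A coordinated search replaces a pre-built specific term such as
# "copper corrosion" with the two general terms used together.
print(sorted(search_all(POSTINGS, "COPPER", "CORROSION")))
```

Under this approach the general term COPPER keeps a high posting count, but no additional specific term has to be created and maintained for each aspect of the concept.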

These difficulties are compounded by the fact that the existing maintenance
programs are designed to facilitate the adding of new terms to the thesaurus.
Since reorganization of terms can only be accomplished by two updating runs,
this has not been undertaken very often. (This problem was dealt with
in the current revision effort, but it undoubtedly accounts for many existing
discrepancies.)

With regard to the construction of terms, the existing procedures make
several specific provisions, some of which seem strangely arbitrary,
but the thesaurus contains a great many instances
in which these provisions have not been observed. For example, the rule
against plural forms properly provides that terms connoting scientific
disciplines are not to be considered plurals; the thesaurus includes seven terms
containing the word "geophysic" yet includes still other inconsistent patterns,
as illustrated by terms such as MATHEMATIC CONFERENCE and MATHEMATICS
INSTITUTE. There is an apparently arbitrary rule against adjectives ending in
"-al" which has resulted in such awkward constructions as STATISTIC ANALYSIS
and ELECTRIC ENGINEER. At the same time, inconsistencies such as STATISTICAL
THERMODYNAMICS and GLASS ELECTRICAL CONDUCTIVITY have found their way into the
thesaurus. The overall effect of these unsuccessful efforts to systematize
term construction has been to create terms that are unnecessarily awkward and
synthetic. Put another way, the natural language has become
quite unnatural.

It should be noted that there were times when the technical judgments of the
project were tried or bent to maintain a consistent pattern in term
construction. The bending seemed generally to conflict with natural language as it
appears in the text of documents being processed or in normally expected
expressions for dissemination and retrieval. These concerns stem from the
realization that the thesaurus is ultimately a tool, and user requirements
for concept labelling should take precedence over lexicographic preferences
for terms.

THESAURUS  FORMAT 

Generally, the thesaurus format is quite useful and compares favorably with
that of other indexing vocabularies. There are, however, certain constraints
inherent in the format which, if they cannot be circumvented, should be dealt
with much more candidly in the Lexicographic Guide and in the thesaurus
introductory material.

Alphabetized Vocabulary

The alphabetized vocabulary should be considered the most important indexing
and retrieval tool. Since this section lists terms alphabetically, it should
be useful to users having any degree of familiarity with the vocabulary.
Moreover, the most complete information about the terms and their disposition
in the other displays is given here.

1. Scope Notes

Scope Notes have been misused in a few cases in the CIRC thesaurus;
i.e., they have been used to supply dictionary definitions, rather
than to explain ambiguous or closely overlapping terms. Of more
importance are the many terms that convey no obvious meaning and for
which no Scope Notes have been provided.

2. Synonyms

There are relatively few actual synonyms among technical terms, but
there are many instances in which the concepts represented by sets
of terms overlap to such an extent that a single valid indexing
concept is represented. There are even more instances in which
specific concepts can be indexed by a combination of more general
terms. When properly used, the device of designating as "synonyms"
certain sets of terms will enhance indexing and retrieval consistency
and will prevent the detrimental proliferation of terms.

FTD has recognized and rectified many instances of poorly chosen
synonyms, but several still exist. In addition, there are many cases
in which terms overlap to such an extent that they should be
considered synonyms for indexing and retrieval purposes.

3. See Also references

There has been no consistent use of See Also references. The
importance of these references, particularly in view of the constraints
imposed by the assignment of terms to one subject area and one
hierarchy, has been underestimated. Adequate criteria for the
establishment of See Also references have not been developed.

Subject-Structured Vocabulary

The subject structure was patterned closely after the ISC. This was a
requirement in the original CIRC vocabulary and had the advantage of helping to
insure complete coverage of the scientific portion of the ISC. It had the
disadvantage of imposing upon the CIRC thesaurus a subject structure that was
in some respects outmoded and unduly arbitrary. The questionable utility of
some of the terminology of this structure was noted above. Not surprisingly,
FTD's building upon this base has resulted in a subject
categorization and hierarchical structure that confuses new users who are unaware of
the original criteria.

1.  Vocabulary  groupings 

The 56 vocabulary groupings include the major fields of science and
engineering, a few categories relating to military science and
technology, and a few groups of miscellaneous content. On the
surface, there appear to be no significant shortcomings in this
categorization. However, some problems have arisen in the assignment
of the thesaurus terms to these groups. These problems stem from:

a. The inherent arbitrariness and rigidity of categorization
schemes.

b. The assignment of each term to only one category.

c. Inconsistent interpretation of the scope of some categories.

d. Failure in some cases to determine and to accommodate user
requirements.

The subject-structured vocabulary must be recognized as only an adjunct to
the alphabetized vocabulary. It is a device for a display of certain term
relationships and its usefulness is in direct proportion to the degree of
familiarity that a given user has with the thesaurus. The arbitrary nature
of subject categorizations and the arbitrary decisions that are required to
assign new terms to categories make this display relatively useless to anyone
unfamiliar with its arrangement and content. On the other hand, frequent
thesaurus users can learn to use the display effectively, no matter what the
arrangement.

The main objective of a display of this kind is to provide thesaurus users,
i.e., indexers and searchers, with a quick reference to a subset of the total
vocabulary. To be useful, the subset should:

a. Encompass subject matter that will correspond in some way
to that of items being indexed or searches being conducted.

b. Represent a comprehensive and coherent treatment of that
subject matter.

c. Present a display of a size that can be scanned quickly and
easily.

In subdividing a large heterogeneous vocabulary, such as that of CIRC, the
isolation of certain more obvious, desirable, or coherent categories
inevitably creates a residue of miscellaneous terms that give rise to some rather
awkward categories. These categories of miscellaneous terms need not unduly
influence, or detract from, the more desirable coherent categories. For
instance, miscellaneous categories, such as Chemical Products, Scientific
Instruments, and Other Products and Equipment, should not attract sets of terms
which are closely or exclusively oriented to another subject area. Some
examples of "attracted" terms which have more logical alternate locations
are: DRUGS, which appears in the subject area Chemical Products but from
some points of view might very well be assigned to Medical Sciences;
METEOROLOGICAL INSTRUMENT, which appears in Scientific Instruments but might well be
assigned to Meteorology; and PHOTOGRAPHIC EQUIPMENT, which by its appearance in
Other Products and Equipment is separated from many closely related
photography concepts that are found in Optics.

When terms can be assigned to only one category, subjective judgments
regarding the disposition of terms are required. A given category should be judged
primarily on the basis of its value as a display device in relation to the
vocabulary as a whole, and only incidentally in relation to other categories.
An overall rationale and certain guidelines should be employed to facilitate
updating, but consistency per se is not of primary importance. The goal
should be to satisfy the requirements of a majority of thesaurus users.

2.  Hierarchical  relationships 

The hierarchical relationships displayed in the subject-structured
section and, by means of BT-NT cross references, in the alphabetized
vocabulary are quite valuable in that they provide a degree of
organization to the vocabulary as a whole and implied definitions to
individual terms. Searches conducted by means of the hierarchies
can contribute to the effectiveness of the retrieval system.

The hierarchies in the CIRC Thesaurus have been developed from
various points of view and with varying degrees of consistency.
Serious constraints are imposed by the assignment of terms to only
one hierarchy and from the limitation on the number of levels a
hierarchy may contain.

A simple example of inconsistency in the development of hierarchies
in the CIRC thesaurus is the treatment of the terms ASTRONOMY and
ASTROPHYSICS in Subject Area 06. Both terms are level one, implying
that they are considered separate fields of study, but the term
ASTROPHYSICS CONFERENCE appears subordinate to ASTRONOMIC CONFERENCE,
implying that astrophysics is a branch of astronomy. Further,
PRACTICAL ASTRONOMY and DESCRIPTIVE ASTRONOMY appear in separate
hierarchies from ASTRONOMY, but ASTRONOMIC GEODESICS is subordinate
to ASTRONOMY. ASTRONOMIC OBSERVATION is subordinate to ASTRONOMIC
GEODESICS, but presumably could apply equally well to practical or
descriptive astronomy.

The  difficulty  that  arises  from  assigning  terms  to  only  one  hierarchy 
is  shown  by  the  terms  subordinate  to  PHOTOGRAPHIC  ASTRONOMY.  These 
terms,  LUNAR  PHOTOGRAPHY,  PLANETARY  PHOTOGRAPHY,  SOLAR  PHOTOGRAPHY 
and  STELLAR  PHOTOGRAPHY,  would  be  equally  appropriate  as  members  of 
the  PHOTOGRAPHY  hierarchy  that  appears  in  field  10.  A  hierarchical 
search of the term PHOTOGRAPHY would yield only some of the
appropriate references in the file.
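
The effect of single versus multiple hierarchical assignment on a hierarchical search can be sketched in a few lines of Python. The function `descendants` and the two dictionaries are hypothetical illustrations. The four photography terms come from the example above, while the actual narrower terms of PHOTOGRAPHY in field 10 are not listed in this report and are left empty here.

```python
ASTRO_PHOTO = [
    "LUNAR PHOTOGRAPHY",
    "PLANETARY PHOTOGRAPHY",
    "SOLAR PHOTOGRAPHY",
    "STELLAR PHOTOGRAPHY",
]

# Single assignment: the four terms sit only under PHOTOGRAPHIC ASTRONOMY.
SINGLE = {"PHOTOGRAPHIC ASTRONOMY": ASTRO_PHOTO, "PHOTOGRAPHY": []}

# Multiple assignment: the same terms are also narrower terms of PHOTOGRAPHY.
MULTIPLE = {"PHOTOGRAPHIC ASTRONOMY": ASTRO_PHOTO, "PHOTOGRAPHY": ASTRO_PHOTO}


def descendants(narrower, term):
    """Return the term plus every term reachable through narrower-term links."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(narrower.get(current, []))
    return seen


# Number of narrower terms a hierarchical search on PHOTOGRAPHY would expand to.
print(len(descendants(SINGLE, "PHOTOGRAPHY")) - 1)
print(len(descendants(MULTIPLE, "PHOTOGRAPHY")) - 1)
```

With single assignment the search on PHOTOGRAPHY never reaches the four astronomical photography terms; with multiple assignment it reaches all of them.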

PERMUTED  VOCABULARY 

This is an excellent means of displaying the thesaurus vocabulary. It can be
seen that consistent construction of analogous terms is important since
similarities or coincidences in the words that form the terms provide the
basis for showing one kind of relationship among terms.
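
A permuted display can be approximated by filing each term once under every word it contains. The Python sketch below does this for a few terms quoted earlier in this appendix. The function name `permuted_index` is hypothetical, and the layout is a simplification of the computer-printed display.

```python
def permuted_index(terms):
    """Return sorted (word, term) pairs, one pair per distinct word in each term."""
    pairs = {(word, term) for term in terms for word in term.split()}
    return sorted(pairs)


TERMS = [
    "MATHEMATIC CONFERENCE",
    "MATHEMATICS INSTITUTE",
    "ASTROPHYSICS CONFERENCE",
    "STATISTIC ANALYSIS",
]

for word, term in permuted_index(TERMS):
    print(f"{word:<14}{term}")
```

In the resulting listing the two CONFERENCE terms file together, while MATHEMATIC and MATHEMATICS file under separate words, which shows how inconsistent term construction weakens the display.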

RECOMMENDATIONS 

1.  A  review  should  be  made  of  the  lexicography  function  at  FTD  with  a  view 
toward  elevating  its  relative  position  to  one  of  responsibility  for  the 
technical  content  as  well  as  other  aspects  of  the  vocabulary. 

2.  A  review  should  be  made  of  the  thesaurus  format  with  emphasis  on: 

a.  Recognition  of  the  alphabetized  section  as  the  most  important 
part  of  the  thesaurus  and  the  other  displays  as  adjuncts  to  it. 

b. An objective assessment of the validity of the existing subject
categorization and its responsiveness to user requirements.

c. A feasibility study of multiple hierarchical assignment of terms.

d. The introduction, instead of the SY reference for synonyms, of
an indexing instruction that would prescribe the use of one, two,
or perhaps more terms to index a given concept.

e.  The  establishment  and  implementation  of  criteria  for  developing 
See  Also  references  among  terms. 

3. A reevaluation of thesaurus terminology should be conducted based on the
utility of the terms in indexing and retrieval. Terms that have not been
used should be deleted. Terms that have been used very infrequently in
the past three or so years should be considered likely candidates for
deletion or for being made synonyms of valid terms. Terms for which no
meaning is obvious or readily available, as in reference works, should be
investigated. Those that are valid, e.g., in terms of reflecting usage
terminology and unique concepts, should be given Scope Notes. Others
should be deleted or made synonyms of valid terms.

4.  Practical  conventions  for  term  construction  should  be  established  and 
applied  consistently. 

5.  The  problem  of  "overposted"  terms  should  be  reassessed.  Valid  terms  that 
have  been  posted  so  frequently  that  they  are  no  longer  useful  in  retrieval 
should  be  so  designated  in  the  thesaurus  to  discourage  their  use  when 
suitable alternatives are available.

A description is presented of the compilation of a controlled vocabulary for subject
indexing scientific and technical information within the DOD intelligence community.
The product of the effort was a computer-generated thesaurus comprising some 22,540
terms which corresponds closely to the CIRC Thesaurus (FTD-TM-22-07-69) with regard
to format and term construction. The final thesaurus was compiled after preparing four
microthesauri for other activities in the intelligence community which were then
integrated with the CIRC thesaurus. The DOD S&T Intelligence Thesaurus is divided into
three parts: a subject-structured display in which the terms are hierarchically
related within 56 broad subject areas; a permuted display in which the single- and
multi-word terms are arranged by computer in the order of each word that appears
within the terms; and an alphabetical display in which the terms are listed alphabetically
with cross references among subject-related or hierarchically related terms and to
synonyms where applicable, and with scope notes to indicate the intended usage of
ambiguous terms. Appended to the report are discussions of the existing procedures
for thesaurus maintenance at CIRC and of some considerations concerning the philosophy
of the CIRC thesaurus, with recommendations regarding both. Also appended is a
description of a preliminary study of thesaurus maintenance using on-line data
processing techniques.


